Abstract



Summary: 



Founded upon diffusion with damping, ITM Probe is an application for modelling information flow in protein
interaction networks without prior restriction to the sub-network of interest. Given a context consisting of desired
origins and destinations of information, ITM Probe returns the set of most relevant proteins with weights and a
graphical representation of the corresponding sub-network. With a click, the user may send the resulting protein list
for enrichment analysis to facilitate hypothesis formation or confirmation.



Availability: 



ITM Probe web service and documentation can be found at
www.ncbi.nlm.nih.gov/CBBresearch/qmbp/mn/itm_probe



Contact: 



yyu@ncbi.nlm.nih.gov 



*to whom correspondence should be addressed 



1 Introduction 



Protein interaction networks are presently under intensive research (Bader et al., 2008). Recently, a number of authors
have applied the concept of random walk (with truncation) to extract biologically relevant information from protein
interaction networks (Nabieva et al., 2005; Tu et al., 2006; Suthram et al., 2008). These approaches, however, do not
model information loss/leakage that naturally occurs in all networks. For example, in cellular networks, proteases
constantly degrade proteins, diminishing the strength of information propagation. We have recently developed a
mathematical framework to model information flow in interaction networks with a novel ingredient, damping/aging of
information (Stojmirovic and Yu, 2007). Implementing the theory, we have constructed a web application, ITM Probe,
which also contains a new model of information propagation: the information channel.

ITM Probe models information flow in a protein interaction network through discrete random walks. Unlike classical 
random walks, our model allows the walker a certain probability to dissipate or damp (that is, to leave the network) 
at each step. Each walk, simulating a possible information path, terminates either by dissipation or by reaching a 
boundary node. 

We distinguish two types of boundary nodes: sources (emitting information) and sinks (absorbing information). 
ITM Probe offers three models: absorbing, emitting and channel. For any network node, the corresponding weight 
returned by the emitting model is the expected number of visits to that node by a random walk originating at given 
source(s). The absorbing model, on the other hand, returns the likelihood of a random walk starting at that node to 
terminate at sink(s). The channel model combines the emitting and absorbing models: it contains both sources and 
sinks as boundary and reports the expected numbers of visits to any network node from random walks originating at 
sources and terminating at sinks. 
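
The three models can be illustrated on a toy graph. The following is a minimal sketch, not the ITM Probe implementation: it assumes a uniform survival probability `mu` at every step, uniform transitions to neighbours, and that a source may be revisited like any other node. The function names and the three-node graph are hypothetical, and the sums are truncated after a fixed number of steps rather than solved exactly.

```python
def emitting_visits(graph, source, mu, sinks=(), n_steps=200):
    """Expected number of visits to each node by a damped walk from `source`.

    graph: dict mapping node -> list of neighbours (each node needs >= 1).
    mu:    assumed uniform probability of NOT dissipating at a step.
    sinks: nodes that absorb the walker; empty for the pure emitting model.
    """
    prob = {node: 0.0 for node in graph}   # P(walker is at node at this step)
    prob[source] = 1.0
    visits = dict(prob)                    # the start counts as one visit
    for _ in range(n_steps):               # truncated series; raise for mu near 1
        nxt = {node: 0.0 for node in graph}
        for node, p in prob.items():
            if p == 0.0 or node in sinks:  # absorbed walkers do not move on
                continue
            share = mu * p / len(graph[node])
            for nb in graph[node]:
                nxt[nb] += share
        prob = nxt
        for node in graph:
            visits[node] += prob[node]
    return visits


def absorbing_likelihood(graph, sinks, mu, n_iter=200):
    """Likelihood that a damped walk started at each node ends at a sink."""
    like = {node: (1.0 if node in sinks else 0.0) for node in graph}
    for _ in range(n_iter):                # fixed-point (Jacobi) iteration
        like = {
            node: 1.0 if node in sinks
            else mu * sum(like[nb] for nb in graph[node]) / len(graph[node])
            for node in graph
        }
    return like


def channel_visits(graph, source, sink, mu):
    """Expected visits to each node by walks from `source` that end at `sink`."""
    visits = emitting_visits(graph, source, mu, sinks={sink})
    like = absorbing_likelihood(graph, {sink}, mu)
    # Markov property: (visits to node) x (chance of reaching the sink from it).
    return {node: visits[node] * like[node] for node in graph}


# Hypothetical toy network: a path A - B - C.
toy = {"A": ["B"], "B": ["A", "C"], "C": ["B"]}
emit = emitting_visits(toy, "A", mu=0.5)
absorb = absorbing_likelihood(toy, {"C"}, mu=0.5)
channel = channel_visits(toy, "A", "C", mu=0.5)
```

In this sketch the channel weights are not normalised; dividing them by the absorbing likelihood at the source would give visit counts conditional on the walk reaching the sink.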

Each selection of boundary nodes and dissipation rates provides the biological context for the information
transmission modelled. Small dissipation allows random walks to explore nodes farther away from their origin, while
large dissipation quickly terminates most walks. For the channel model, dissipation controls how much a random walk
can deviate from the shortest path from sources to sinks. We call the set of most significant nodes, in terms of the
weights returned, an Information Transduction Module (ITM).



2 Usage 

Both the absorbing and emitting models navigate neighborhoods of selected nodes and illuminate the protein
complexes associated with them. However, the absorbing model can reveal relatively distant 'leaf' nodes linked to a
sink by a nearly unique path, while the emitting model favors highly connected clusters. The channel model is suited
for discovery of potential pathways linking proteins of interest or biological functions associated with them. Using
multiple sources may reveal the potential points of crosstalk between information channels, while a selection of
multiple sinks chosen according to a set of competing hypotheses may suggest the most biologically plausible pathways
among many possible ones.

Every model of ITM Probe requires an interaction graph, the boundary nodes (sources and/or sinks) and the
damping factors as input. The damping factors may be specified directly or by setting the desired average path-length
(emitting/channel model) or the average likelihood of absorption at sinks (absorbing model).
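
The link between a damping factor and an average path-length can be made concrete in the simplest setting. The sketch below assumes a single uniform survival probability `mu`, no sinks, and path-length counted as the number of nodes visited including the start; under those assumptions the count is geometric with mean 1/(1 - mu). The function name is hypothetical, and ITM Probe's own conversion may be defined differently.

```python
def damping_for_mean_visits(mean_visits):
    """Survival probability mu giving the requested mean number of visits.

    Assumes uniform damping and no sinks, so that
    mean_visits = 1 + mu + mu**2 + ... = 1 / (1 - mu).
    """
    if mean_visits < 1:
        raise ValueError("a walk visits at least its starting node")
    return 1.0 - 1.0 / mean_visits


mu = damping_for_mean_visits(4)   # walks visiting four nodes on average
```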

Although our mathematical framework can be applied to any directed graph, our web service presently supports
only the yeast (Saccharomyces cerevisiae) physical interaction networks derived from the BioGRID (Stark et al., 2006)
database. We offer three yeast networks: Full, Reduced and Directed. The Full network consists of all interactions
from the BioGRID as an undirected graph, while the Reduced consists only of those interactions that are from
low-throughput experiments (that is, from publications reporting fewer than 300 interactions) or are reported by at
least two independent publications. The Directed network is derived from Reduced by turning all interactions labelled
as 'Biochemical activity' into directed links (bait → prey).
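
The filtering rule for the Reduced network can be sketched as follows. This is an illustration only: the record format (protein, protein, publication identifier) and the function name are assumptions, not the BioGRID file layout or the authors' code.

```python
from collections import defaultdict


def reduce_network(records, high_throughput_cutoff=300):
    """Keep interactions from low-throughput or multiply-reported sources.

    records: iterable of (protein_a, protein_b, publication_id) tuples
             (a hypothetical, simplified input format).
    Returns a set of undirected interactions, each a frozenset of proteins.
    """
    pairs_by_pub = defaultdict(set)   # publication -> interactions it reports
    pubs_by_pair = defaultdict(set)   # interaction -> publications reporting it
    for a, b, pub in records:
        pair = frozenset((a, b))      # undirected: (a, b) equals (b, a)
        pairs_by_pub[pub].add(pair)
        pubs_by_pair[pair].add(pub)

    kept = set()
    for pair, pubs in pubs_by_pair.items():
        low_throughput = any(
            len(pairs_by_pub[p]) < high_throughput_cutoff for p in pubs
        )
        if low_throughput or len(pubs) >= 2:
            kept.add(pair)
    return kept
```

The cutoff is a parameter so that the rule can be tried on small hand-made inputs.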

To assist in silico investigations on the impact of knocking out certain genes, ITM Probe allows users to specify nodes
to exclude from the network. Furthermore, it is known (Steffen et al., 2002) that proteins with a large number of
non-specific interaction partners might overtake the true signaling proteins in the information flow modeling.
Therefore, ITM Probe by default excludes from the yeast networks the proteins that may provide undesirable shortcuts,
such as cytoskeleton proteins, histones and chaperones. The user may choose to lengthen or shorten this list.



Output and analysis 

ITM Probe outputs a list of the top ranking nodes together with an image of the sub-network consisting of these nodes
(Fig. 1). Images are produced using the Graphviz suite (Gansner and North, 2000). Each protein listed is linked to its
full description in several external databases. The number of nodes to be listed can be specified directly by the user or
determined automatically from the model results through a criterion such as participation ratio (Stojmirovic and Yu,
2007) or the cutoff value. The resulting weights for all nodes can be downloaded in the CSV format for further
analysis.
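
As an illustration of choosing the list length automatically, the sketch below uses the usual definition of the participation ratio, (sum of weights)^2 / (sum of squared weights), which is assumed here to match the cited criterion. The function names are hypothetical and the weights are assumed non-negative.

```python
def participation_ratio(weights):
    """Effective number of nodes carrying the weight (assumed definition)."""
    weights = list(weights)
    total = sum(weights)
    squares = sum(w * w for w in weights)
    if squares == 0:
        return 0.0
    return total * total / squares


def top_nodes(weights_by_node):
    """Return the highest-weighted nodes, as many as the ratio suggests."""
    k = max(1, round(participation_ratio(weights_by_node.values())))
    ranked = sorted(weights_by_node, key=weights_by_node.get, reverse=True)
    return ranked[:k]
```

The ratio equals n when n nodes share the weight equally and 1 when a single node carries all of it, so it behaves as an effective node count.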

Each ITM image can be rendered and saved in multiple formats (SVG, PNG, JPEG, EPS and PDF). For each 
rendering, the users can choose which aspects of results to display, the color map and the scale for presentation 
(linear or logarithmic). When multiple boundary points are specified, it is possible to obtain an overview of all of their 
contributions simultaneously by selecting the color mixture scheme (Fig. 1). In this case, each source (channel/emitting
model) or sink (absorbing model) is assigned a basic CMY (cyan, magenta or yellow) color and the coloring of each 
displayed node is a result of mixing the colors corresponding to its source- or sink- specific values for each of the 
boundary points. 
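
One way such a color mixture could work is sketched below. It assumes that each of up to three boundary points drives one CMY channel in proportion to the node's value for that boundary point, scaled linearly by a per-boundary maximum; the actual scaling and mixing rule used by ITM Probe is not specified here, and the function name is hypothetical.

```python
def cmy_mix(values, max_values):
    """Mix up to three boundary-specific values into one RGB hex color.

    values:     the node's non-negative value for each boundary point.
    max_values: the value that maps to full intensity for each boundary point.
    Channel order is cyan, magenta, yellow (an assumed assignment).
    """
    if len(values) > 3 or len(values) != len(max_values):
        raise ValueError("expected matching sequences of at most three values")
    cmy = [0.0, 0.0, 0.0]
    for i, (value, vmax) in enumerate(zip(values, max_values)):
        cmy[i] = min(1.0, value / vmax) if vmax > 0 else 0.0
    # CMY is subtractive: each channel removes its complementary RGB component.
    red, green, blue = (round(255 * (1.0 - c)) for c in cmy)
    return "#{:02x}{:02x}{:02x}".format(red, green, blue)
```

Under this scheme a node visited only from the second boundary point appears magenta, and a node reached equally from all three tends towards black.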

While it is possible to specify any proteins in the network as sources and sinks, not every context produces
biologically meaningful results. To facilitate biological interpretation of the users' results, we have locally
implemented a Gene Ontology (GO) (Ashburner et al., 2000) enrichment tool based on GO::TermFinder of Boyle et al.
(2004). It compares a given input list of proteins to the lists annotated with GO terms and finds those GO terms that
statistically best explain the input list. Every ITM Probe results page contains a query form allowing the user to
specify the number of the top ranking proteins to consider for GO term enrichment analysis.
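
The statistical comparison behind this kind of enrichment analysis is commonly a hypergeometric test. The sketch below computes the one-sided p-value for a single GO term; the function and parameter names are hypothetical, and a real analysis would also correct for testing many terms, which is omitted here.

```python
from math import comb


def enrichment_p_value(population, annotated, sample, hits):
    """P(at least `hits` annotated proteins in a random sample).

    population: number of proteins in the background set.
    annotated:  number of background proteins annotated with the GO term.
    sample:     size of the input list (e.g. the top ranking ITM nodes).
    hits:       number of input proteins annotated with the GO term.
    """
    upper = min(annotated, sample)
    tail = sum(
        comb(annotated, i) * comb(population - annotated, sample - i)
        for i in range(hits, upper + 1)
    )
    return tail / comb(population, sample)
```

A small p-value means the term is over-represented in the input list relative to what random sampling from the background would produce.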



Example 

Histone acetyltransferases remodel chromatin by acetylating histone octamers and hence may play an important role
in transcription activation (Sterner and Berger, 2000). To explore the interface between them and the RNA Polymerase
II core in yeast, we choose three histone acetyltransferases (Hat1p, Gcn5p, Elp3p) as sources and a catalytic subunit
Rpo21p of RNA Polymerase II as a sink for the channel model (Fig. 1). From the color mixing image it appears
that Elp3p and Gcn5p interact with Rpo21p through a wide channel of proteins, while Hat1p seems to be remote
from Rpo21p. This prompts the hypothesis that Hat1p is not directly involved in transcription activation. Enrichment
analysis, using the 16 nodes (shown in magenta color in Fig. 1) mostly visited from Hat1p, shows that Hat1p and
these nodes participate mainly in DNA replication and only indirectly in transcription regulation, thus reinforcing the
hypothesis. Similar analysis on the nodes associated with Elp3p indicates the interaction is almost exclusively through
the elongator complex. The nodes associated with Gcn5p are less specific, indicating a more generic interface, but are
all involved in mRNA transcription.

Figure 1: An example ITM from running the ITM Probe channel model.

3 Outlook 

We plan to include interaction networks from additional organisms once their coverage and quality become comparable
to those of the yeast networks. In principle, the analysis from ITM Probe can be integrated with existing partial knowledge to form
a broad picture of possible communication paths in cellular processes. The concept of context-specific analysis may 
find applications beyond biological networks. 